Antipattern: Use a Generic Attribute Table

Let's create a generic attribute table and observe how it affects a database.

The solution that appeals to some programmers when they need to support variable attributes is to create a second table, storing attributes as rows. See the diagram below showing the two tables.

EAV Entity Relationship

Each row in this attribute table has three columns:

  • The Entity: Typically this is a foreign key to a parent table that has one row per entity.

  • The Attribute: This is simply the name of a column in a conventional table, but we have to identify the attribute on each given row.

  • The Value: Each entity has a value for each of its attributes.

For example, a given bug is an entity we identify by its primary key value 1234. It has an attribute called status. The value of that attribute for bug 1234 is NEW.

This design is called Entity-Attribute-Value (EVA). It’s also sometimes called open schema, schemaless, or name-value pairs.

Let’s create the Issues table and the IssueAttributes table by using the following code.

Creating Issues and IssueAttributes tables

After creating the tables, let’s insert the data as given.

Inserting data into Issues and IssueAttributes tables

Test it yourself in the following playground.

Retrieving data from IssueAttributes

By adding one additional table, we seem to gain the following benefits:

  • Both tables have a few columns.

  • The number of columns doesn’t need to grow to support new attributes.

  • We avoid a clutter of columns that contain null in rows where the attribute is inapplicable.

This appears to be an improved design. However, the simple database structure doesn’t make up for the difficulty of using it.

Querying an attribute#

Imagine that your boss needs to run a report of the bugs reported per day. In a conventional table design, the Issues table would have a simple attribute column such as date_reported. The statement for creating the conventional Issues table is given below:

Creating conventional Issues table

You can insert the data in the Issues table, as follows:

Inserting data into Issues table

To query all bugs with their report dates, your boss could use a simple query like this:

Querying all bugs with their report dates

To get the same information as the previous query using the EAV design, your boss needs to fetch rows from the IssueAttributes table that stores an attribute named by the string date_reported. This query is more verbose but less clear.

Querying date_reported in EAV design

Reconstructing a row#

It’s natural to retrieve a row from the Issues table with all its attributes in columns. We want to fetch an issue in a single row as though it were stored in a conventional table. We would like to view the records like in the following table.

issue_id date_reported status priority description
1234 2009-06-01 NEW HIGH Saving does not work

Because each attribute is stored on a separate row of the IssueAttributes table, retrieving them all as part of a single row requires a JOIN for each attribute. We must know all attributes at the time we write this query. The following query reconstructs the row shown earlier:

Retrieving data as it were stored in a conventional table

We must use OUTER JOIN operations because INNER JOIN operations would cause the query to return no rows if any of the attributes were not present in the IssueAttributes table. As the number of attributes increases, so does the number of JOIN operations, and the cost of this query increases exponentially.

However, if we need a query to support such a high number of JOIN operations that we’re concerned about exceeding the database’s limits, we may have a problem with our database design. It’s common for an EAV design to lead to this problem.

Synopsis: Entity-Attribute-Value
Supporting Data Integrity
Mark as Completed
Report an Issue